The head-modifier principle and multilingual term extraction

Authors

  • Andrew Hippisley
  • David Cheng
  • Khurshid Ahmad
Abstract

Advances in language engineering may depend on theoretical principles originating from linguistics, since both share a common object of enquiry: natural language structures. We outline an approach to term extraction that rests on theoretical claims about the structure of words. We use the structural properties of compound words specifically to elicit the sets of terms defined by type hierarchies such as hyponymy and meronymy. The theoretical claims revolve around the head-modifier principle, which determines the formation of a major class of compounds. Significantly, it has been suggested that the principle operates in languages other than English. To demonstrate the extendibility of our approach beyond English, we present a case study of term extraction in Chinese, a language whose written form is the vehicle of communication for over 1.3 billion language users and which therefore has great significance for the development of language engineering technologies.
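As a rough illustration of how the head-modifier principle can yield hyponym candidates, the sketch below groups multiword terms by their head word. It is not the authors' method. It assumes English compounds are right-headed (the last word is the head), that whitespace tokenisation is adequate, and that a compound is a candidate hyponym of its bare head. The function name `group_by_head` and the sample terms are hypothetical.

```python
from collections import defaultdict


def group_by_head(terms):
    """Map each head word to the compound terms it heads.

    Assumption: the head is the rightmost word of the compound,
    which holds for a major class of English compounds but is not
    guaranteed for every term or every language.
    """
    by_head = defaultdict(list)
    for term in terms:
        words = term.lower().split()
        if len(words) < 2:
            continue  # a single word has no modifier to strip
        head = words[-1]
        by_head[head].append(" ".join(words))
    return dict(by_head)


# Hypothetical sample terms, invented for illustration only.
terms = [
    "nuclear reactor",
    "fast breeder reactor",
    "reactor",
    "steam turbine",
    "gas turbine",
]

candidates = group_by_head(terms)
for head, compounds in candidates.items():
    print(head, "<-", compounds)
```

Each compound listed under a head is only a candidate: "fast breeder reactor" is plausibly a kind of "reactor", but a grouping like this cannot by itself rule out non-compositional compounds. A language written without spaces between words, such as Chinese, would also need a segmentation step before any grouping of this kind, which is not shown here.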


Similar resources

How Naked is the Naked Truth? A Multilingual Lexicon of Nominal Compound Compositionality

We introduce a new multilingual resource containing judgments about nominal compound compositionality in English, French and Portuguese. It covers 3 × 180 noun-noun and adjective-noun compounds for which we provide numerical compositionality scores for the head word, for the modifier and for the compound as a whole, along with possible paraphrases. This resource was constructed by native speake...


Extraction of Multilingual Term Variants in the Business Reporting Domain

Within the context of the European research project "Monnet", which implements, among other activities, ontology-based multilingual information extraction, we tackle the issue of recognizing variants of concept labels in business reports that guide the information extraction process. In this short paper, we describe two related experiments in finding variants of multilingual taxonomy labels u...


Multilingual Deep Bottle Neck Features: A Study on Language Selection and Training Techniques

Previous work has shown that training the neural networks for bottle neck feature extraction in a multilingual way can lead to improvements in word error rate and average term weighted value in a telephone key word search task. In this work we conduct a systematic study on a) which multilingual training strategy to employ, b) the effect of language selection and amount of multilingual training ...


Terminological Variation, a Means of Identifying Research Topics from Texts

After extracting terms from a corpus of titles and abstracts in English, syntactic variation relations are identified amongst them in order to detect research topics. Three types of syntactic variation were studied: permutation, expansion and substitution. These syntactic variations yield other relations of a formal and conceptual nature. Basing on a distinction of the variation relations accor...


Extraction of the Antimalarial Artemisinin from Artemisia Annua L. Leaves with Supercritical CO2

The antimalarial artemisinin, the active principle of the plant Artemisia annua L., was extracted by Supercritical Fluid Extraction (SFE). Soxhlet extraction was studied as a comparison process. The extracts were analyzed by the indirect method GC-FID. The maximal extraction yield (1.82% DW) was obtained by SFE at T = 40 °C, P = 200 bar, CO2 flow = 4.5 g/min, and 20% ethanol as modifier. The artemisinin e...



Journal:
  • Natural Language Engineering

Volume 11

Publication date: 2005